BanglaLekha-Isolated: A Comprehensive Bangla Handwritten Character Dataset

Authors

  • Mithun Biswas
  • Rafiqul Islam
  • Gautam Kumar Shom
  • Md Shopon
  • Nabeel Mohammed
  • Sifat Momen
  • Md Anowarul Abedin
Abstract

Bangla handwriting recognition is becoming an increasingly important problem, particularly for the Bangla-speaking population of Bangladesh and West Bengal. With this in mind, we introduce a comprehensive Bangla handwritten character dataset named BanglaLekha-Isolated. The dataset contains Bangla handwritten numerals, basic characters, and compound characters. It was collected from multiple geographical locations within Bangladesh and includes samples from a variety of age groups. The dataset can also be used for other classification problems, such as gender, age, and district. It is the largest dataset of Bangla handwritten characters to date.

Similar resources

BanglaLekha-Isolated: A multi-purpose comprehensive dataset of Handwritten Bangla Isolated characters

BanglaLekha-Isolated, a Bangla handwritten isolated character dataset, is presented in this article. The dataset contains 84 different characters, comprising 50 Bangla basic characters, 10 Bangla numerals, and 24 selected compound characters. 2000 handwriting samples for each of the 84 characters were collected, digitized, and pre-processed. After discarding mistakes and scribbles, 1,66,105 han...

Development of a Multi-User Recognition Engine for Handwritten Bangla Basic Characters and Digits

Abstract: The objective of the paper is to recognize handwritten samples of basic Bangla characters using Tesseract open source Optical Character Recognition (OCR) engine under Apache License 2.0. Handwritten data samples containing isolated Bangla basic characters and digits were collected from different users. Tesseract is trained with user-specific data samples of document pages to generate ...

Handwritten Bangla Character Recognition Using The State-of-Art Deep Convolutional Neural Networks

In spite of advances in object recognition technology, Handwritten Bangla Character Recognition (HBCR) remains largely unsolved due to the presence of many ambiguous handwritten characters and excessively cursive Bangla handwritings. Even the best existing recognizers do not lead to satisfactory performance for practical applications related to Bangla character recognition and have much lower p...

Recognition of Isolated Multi-Oriented Handwritten/Printed Characters using a Novel Convex-Hull Based Alignment Technique

Handwritten character recognition is one of the difficult tasks of pattern recognition due to diverse writing styles. The problem becomes more severe if the characters are written in a cursive fashion with varying orientations. Also there may exist printed characters of different shapes/fonts and sizes in a document image. In the current work, we have presented a novel convex hull based alignme...

Word Extraction and Character Segmentation from Text Lines of Unconstrained Handwritten Bangla Document Images

In this paper, a novel approach for word extraction and character segmentation from the handwritten Bangla document images is reported. At first, a modified Run Length Smoothing Algorithm (RLSA), called Spiral Run Length Smearing Algorithm (SRLSA), is applied for the extraction of words from the text lines of unconstrained handwritten Bangla document images. This technique has helped to overcom...

Journal:
  • CoRR

Volume: abs/1703.10661

Publication date: 2017